Abstract

We prove an inequality for the entropy numbers in terms of nonlinear Kolmogorov widths. This inequality is in the spirit of known inequalities of this type and is adjusted to a form convenient for applications to $m$-term approximation with respect to a given system. We also obtain upper bounds for the $m$-term approximation by the Weak Relaxed Greedy Algorithm with respect to a system which is not a dictionary.

1 Introduction 

This paper was motivated by the very recent paper [3]. The authors of [3] study the entropy and best $m$-term approximation of the $\ell_q$-hulls of finite systems of elements in the $L_p$ spaces. They conduct this study by probabilistic methods. In this context probabilistic methods were used in some earlier papers, for instance, in [2]. Here we demonstrate how known results from greedy approximation in Banach spaces, combined with the known technique of general inequalities for the entropy numbers, allow us to obtain similar results. Moreover, we show that the use of a greedy algorithm allows us to provide a deterministic construction of good $m$-term approximants.

A number of different widths are studied in approximation theory: Kolmogorov widths, linear widths, Fourier widths, Gel'fand widths, Alexandrov widths and others. All these widths were introduced in approximation theory as characteristics of function classes (more generally, compact sets) which give the best possible accuracy of algorithms with certain restrictions. For instance, Kolmogorov's $n$-width for a centrally symmetric compact set $F$ in a Banach space $X$ is defined as follows:

\[ d_n(F,X) := \inf_{L_n} \sup_{f\in F} \inf_{g\in L_n} \|f-g\|_X, \]

where $\inf_{L_n}$ is taken over all $n$-dimensional subspaces of $X$. In other words, the Kolmogorov $n$-width gives the best possible error in approximating a compact set $F$ by $n$-dimensional linear subspaces.

There has been increasing interest in recent decades in nonlinear $m$-term approximation with regard to different systems. In [1] we generalized the concept of the classical Kolmogorov width in order to use it in estimating best $m$-term approximation. For this purpose we introduced a nonlinear Kolmogorov $(N,m)$-width:

\[ d_m(F,X,N) := \inf_{\mathcal L_N,\ \#\mathcal L_N\le N}\ \sup_{f\in F}\ \inf_{L\in\mathcal L_N}\ \inf_{g\in L} \|f-g\|_X, \]

where $\mathcal L_N$ is a set of at most $N$ $m$-dimensional subspaces $L$. It is clear that

\[ d_m(F,X,1) = d_m(F,X). \]

The new feature of $d_m(F,X,N)$ is that we allow the subspace $L\in\mathcal L_N$ to be chosen depending on $f\in F$. It is clear that the bigger $N$ is, the more flexibility we have to approximate $f$. It turns out that from the point of view of our applications the two cases

\[ N \asymp K^m, \tag{1.1} \]

where $K>1$ is a constant, and

\[ N \asymp m^{am}, \tag{1.2} \]

where $a>0$ is a fixed number, play an important role.

It is known (see [6]) that the $(N,m)$-widths can be used for estimating from below the best $m$-term approximations. Let $X$ be a Banach space and let $B_X$ denote the unit ball of $X$ with the center at $0$. Denote by $B_X(y,r)$ a ball with center $y$ and radius $r$: $\{x\in X : \|x-y\|\le r\}$. For a compact set $A$ and a positive number $\varepsilon$ we define the covering number $N_\varepsilon(A,X)$ as follows:

\[ N_\varepsilon(A,X) := \min\{n : \exists y^1,\dots,y^n : A\subseteq \cup_{j=1}^{n} B_X(y^j,\varepsilon)\}. \]

It is convenient to consider along with the entropy $H_\varepsilon(A,X) := \log N_\varepsilon(A,X)$ (here and later $\log := \log_2$) the entropy numbers $\varepsilon_k(A,X)$:

\[ \varepsilon_k(A,X) := \inf\{\varepsilon : \exists y^1,\dots,y^{2^k}\in X : A\subseteq \cup_{j=1}^{2^k} B_X(y^j,\varepsilon)\}. \]
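To make these definitions concrete, here is a minimal sketch that bounds covering numbers and entropy numbers of a finite point set in Euclidean space from above by a greedy net. The names `greedy_cover_size`, `entropy_number_upper` and the sample set `grid` are illustrative assumptions and not part of the paper; because the centers are restricted to the set itself and the net is built greedily, the values are only upper bounds for $N_\varepsilon$ and $\varepsilon_k$.

```python
import math

def greedy_cover_size(points, eps):
    """Upper bound on the covering number of a finite set by a greedy eps-net.

    Centers are taken from the set itself, so the count bounds the covering
    number from above (the definition allows arbitrary centers in X).
    """
    centers = []
    for x in points:
        # x becomes a new center only if no existing ball of radius eps contains it.
        if all(math.dist(x, c) > eps for c in centers):
            centers.append(x)
    return len(centers)

def entropy_number_upper(points, k, radii):
    """Smallest candidate radius at which the greedy net uses at most 2**k balls."""
    feasible = [eps for eps in radii if greedy_cover_size(points, eps) <= 2 ** k]
    return min(feasible) if feasible else None

# Eleven equally spaced points of [0, 1], viewed as a subset of the real line.
grid = [(j / 10,) for j in range(11)]
candidate_radii = [0.05, 0.15, 0.25, 0.35, 0.55]
```

With this sample set, `greedy_cover_size(grid, 0.25)` counts the balls of radius 0.25 used by the greedy net, and `entropy_number_upper(grid, 2, candidate_radii)` returns the smallest listed radius for which at most $2^2$ balls are used.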

There are several general results (see pQ) which give lower estimates of 
the Kolmogorov widths $d_n(F,X)$ in terms of the entropy numbers $\varepsilon_k(F,X)$. Carl's inequality (see pQ) states: for any $r>0$ we have

\[ \max_{1\le k\le n} k^r\varepsilon_k(F,X) \le C(r)\max_{1\le m\le n} m^r d_{m-1}(F,X). \tag{1.3} \]

We proved in [1] (see also [7], Section 3.5) the inequality

\[ \max_{1\le k\le n} k^r\varepsilon_k(F,X) \le C(r,K)\max_{1\le m\le n} m^r d_{m-1}(F,X,K^m), \tag{1.4} \]

where we denote

\[ d_0(F,X,N) := \sup_{f\in F}\|f\|_X. \]

This inequality is a generalization of inequality (1.3). We also discussed in [1] and in Section 3.5 of [7] the possibility of replacing $K^m$ by $(Kn/m)^m$ in (1.4). The corresponding remarks (Remark 2.1 in [4] and Remark 3.5 in [7]) should read as follows.

Remark 1.1. Examining the proof of (1.4) one can check that the following inequality holds:

\[ n^r\varepsilon_n(F,X) \le C(r,K)\max_{1\le m\le n} m^r d_{m-1}(F,X,(Kn/m)^m). \]

In Section 2 we prove an upper bound for $\varepsilon_k(F,X)$ for all $k\le n$.

In Section 3 we demonstrate how the general inequality from Theorem 2.1 can be used in estimating the entropy numbers of different compacts. In particular, Corollary 3.3 gives a new proof of the corresponding upper bounds from Theorem 1 in [3].

In Section 4 we study the Weak Relaxed Greedy Algorithm with respect to a system which is not a dictionary. In particular, results of Section 4 provide an algorithm which gives the same upper bounds for the best $m$-term approximation as those obtained in [3].

2 A general inequality



Theorem 2.1. Let a compact $F\subset X$ and a number $r>0$ be such that for some $n\in\mathbb N$

\[ d_{m-1}(F,X,(Kn/m)^m) \le m^{-r}, \quad m\le n. \]

Then for $k\le n$

\[ \varepsilon_k(F,X) \le C(r,K)\left(\frac{\log(2n/k)}{k}\right)^r. \]

Proof. Let $X(N,m)$ denote the union of not more than $N$ subspaces $L$ with $\dim L\le m$. Consider a collection $\mathcal K(l) := \{X((Kn2^{-s-1})^{2^{s+1}}, 2^{s+1})\}_{s=1}^{l}$, $2^{l+1}\le n$, and denote

\[ H_r(\mathcal K(l)) := \Big\{ f\in X : \exists L^1(f),\dots,L^l(f) : L^s(f)\in X((Kn2^{-s-1})^{2^{s+1}}, 2^{s+1}), \]

and $\exists\, t_s(f)\in L^s(f)$ such that

\[ \|t_s(f)\|_X \le 2^{-r(s-1)},\ s=1,\dots,l; \quad \Big\|f-\sum_{s=1}^{l} t_s(f)\Big\|_X \le 2^{-rl} \Big\}. \]

Lemma 2.1. We have for $r>0$

\[ \varepsilon_{2^l}(H_r(\mathcal K(l)),X) \le C(r,K)2^{-rl}(\log(Kn2^{-l}))^r, \quad 2^{l+1}\le n. \]

Proof. We use a well known result (see, for instance, [7], p. 145) to estimate $\varepsilon_n(B_X,X)$ of the unit ball $B_X$ in the $d$-dimensional space $X$:

\[ \varepsilon_n(B_X,X) \le 3(2^{-n/d}). \tag{2.1} \]

Take any sequence $\{n_s\}_{s=1}^{l(r)}$ of $l(r)\le l-2$ nonnegative integers. We will specify $l(r)$ later. Construct $\varepsilon_{n_s}$-nets consisting of $2^{n_s}$ points each for all unit balls of the spaces in $X((Kn2^{-s-1})^{2^{s+1}}, 2^{s+1})$. Then the total number of the elements $y_j^s$ in these $\varepsilon_{n_s}$-nets does not exceed

\[ M_s := (Kn2^{-s-1})^{2^{s+1}} 2^{n_s}. \]

We now consider the set $A$ of elements of the form

\[ y_{j_1}^1 + 2^{-r}y_{j_2}^2 + \dots + 2^{-r(l(r)-1)}y_{j_{l(r)}}^{l(r)}, \quad j_s\in[1,M_s],\ s=1,\dots,l(r). \]

The total number of these elements does not exceed

\[ M = \prod_{s=1}^{l(r)} M_s, \qquad \log M \le \sum_{s=1}^{l(r)} 2^{s+1}\log(Kn2^{-s-1}) + \sum_{s=1}^{l(r)} n_s. \]

It is easy to see that

\[ \sum_{s=1}^{l(r)} 2^{s+1}\log(Kn2^{-s-1}) \le C_1 2^{l(r)}\log(Kn2^{-l(r)}). \]

We now set

\[ n_s := [(r+1)(l(r)-s)2^{s+1}], \quad s=1,\dots,l(r), \]

where $[x]$ denotes the integer part of a number $x$. We choose $l(r)\le l-2$ as a maximal natural number satisfying

\[ \sum_{s=1}^{l(r)} n_s \le 2^{l-1} \]

and

\[ C_1 2^{l(r)}\log(Kn2^{-l(r)}) \le 2^{l-1}. \]



It is clear that

\[ 2^{l(r)} \ge C_2 2^{l}(\log(Kn2^{-l}))^{-1}. \tag{2.2} \]

Then we have

\[ M \le 2^{2^l}. \]

For the error $\varepsilon(f)$ of approximation of $f\in H_r(\mathcal K(l))$ by elements of $A$ we have

\[ \varepsilon(f) \le 2^{-rl} + \sum_{s=1}^{l(r)} \|t_s(f) - 2^{-r(s-1)}y_{j_s}^s\|_X + \sum_{s=l(r)+1}^{l} \|t_s(f)\|_X \]

\[ \le C(r)2^{-rl(r)} + \sum_{s=1}^{l(r)} 2^{-r(s-1)}\varepsilon_{n_s}(B_{L^s(f)},X) \]

\[ \le C(r)2^{-rl(r)} + 3\sum_{s=1}^{l(r)} 2^{-r(s-1)}2^{-n_s/2^{s+1}} \le C(r)2^{-rl(r)}. \]

Taking into account (2.2) we complete the proof of Lemma 2.1. □



We continue the proof of Theorem 2.1. Without loss of generality assume

\[ \max_{1\le m\le n} m^r d_{m-1}(F,X,(Kn/m)^m) < 1/2. \]

Then for $s=1,2,\dots,l$; $l\le[\log(n-1)]$ we have

\[ d_{2^s}(F,X,(Kn2^{-s})^{2^s}) < 2^{-rs-1}. \]

This means that for each $s=1,2,\dots,l$ there is a collection $\mathcal L_s$ of $(Kn2^{-s})^{2^s}$ $2^s$-dimensional spaces $L_j^s$, $j=1,\dots,(Kn2^{-s})^{2^s}$, such that for each $f\in F$ there exists a subspace $L_{j_s}^s(f)$ and an approximant $a_s(f)\in L_{j_s}^s(f)$ such that

\[ \|f-a_s(f)\| \le 2^{-rs-1}. \]

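Consider

\[ t_s(f) := a_s(f) - a_{s-1}(f), \quad s=2,\dots,l. \tag{2.3} \]

Then we have

\[ t_s(f)\in L_{j_s}^s(f)\oplus L_{j_{s-1}}^{s-1}(f), \qquad \dim\big(L_{j_s}^s(f)\oplus L_{j_{s-1}}^{s-1}(f)\big) \le 2^s+2^{s-1} < 2^{s+1}. \]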

Note that for $K$ large enough

\[ (Kn2^{-s})^{2^s}(Kn2^{-s+1})^{2^{s-1}} \le (Kn2^{-s-1})^{2^{s+1}}. \]
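The phrase "for $K$ large enough" can be made quantitative. The following short computation is a sketch in which the auxiliary quantity $a$ is introduced only for this check; it suggests that $K\ge 32$ suffices for the product bound $(Kn2^{-s})^{2^s}(Kn2^{-s+1})^{2^{s-1}}\le(Kn2^{-s-1})^{2^{s+1}}$.

```latex
% Write a := \log(Kn2^{-s}) (an auxiliary symbol, not used in the paper).
\begin{align*}
\log\left((Kn2^{-s})^{2^s}(Kn2^{-s+1})^{2^{s-1}}\right)
  &= 2^s a + 2^{s-1}(a+1) = 2^{s-1}(3a+1),\\
\log\left((Kn2^{-s-1})^{2^{s+1}}\right)
  &= 2^{s+1}(a-1) = 2^{s-1}(4a-4).
\end{align*}
% The required inequality is therefore equivalent to 3a+1 \le 4a-4,
% that is a \ge 5, that is Kn2^{-s} \ge 32.
% Since 2^s \le n-1 < n for s \le [\log(n-1)], the choice K \ge 32 suffices.
```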

Let $X((Kn2^{-s-1})^{2^{s+1}}, 2^{s+1})$ denote the collection of all $L_{j_s}^s\oplus L_{j_{s-1}}^{s-1}$ over various $1\le j_s\le(Kn2^{-s})^{2^s}$; $1\le j_{s-1}\le(Kn2^{-s+1})^{2^{s-1}}$. For $t_s(f)$ defined by (2.3) we have

\[ \|t_s(f)\| \le 2^{-rs-1} + 2^{-r(s-1)-1} \le 2^{-r(s-1)}. \]
Next, for $a_1(f)\in L^1(f)$ we have

\[ \|f-a_1(f)\| \le 1/2 \]

and from $d_0(F,X)\le 1/2$ we get

\[ \|a_1(f)\| \le 1. \]

Take $t_1(f) = a_1(f)$. Then we have $F\subset H_r(\mathcal K(l))$ and Lemma 2.1 gives the required bound

\[ \varepsilon_{2^l}(F,X) \le C(r,K)2^{-rl}(\log(Kn2^{-l}))^r, \quad 1\le l\le[\log(n-1)]. \]

It is clear that these inequalities imply the conclusion of Theorem 2.1. □

3 Applications

We begin with an application which motivated a study of $d_m(F,X,N)$ with $N=(Kn/m)^m$. Let $\mathcal D=\{g_j\}_{j=1}^{n}$ be a system of normalized elements of cardinality $|\mathcal D|=n$ in a Banach space $X$. Consider best $m$-term approximations of $f$ with respect to $\mathcal D$:

\[ \sigma_m(f,\mathcal D)_X := \inf_{\{c_j\};\ \Lambda:|\Lambda|=m} \Big\|f-\sum_{j\in\Lambda} c_j g_j\Big\|. \]

For a function class $F$ set

\[ \sigma_m(F,\mathcal D)_X := \sup_{f\in F}\sigma_m(f,\mathcal D)_X. \]

Then it is clear that for any system $\mathcal D$, $|\mathcal D|=n$,

\[ d_m\Big(F,X,\binom{n}{m}\Big) \le \sigma_m(F,\mathcal D)_X. \]

Next,

\[ \binom{n}{m} \le (en/m)^m. \]

Thus Theorem 2.1 implies the following theorem.
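The counting bound $\binom{n}{m}\le(en/m)^m$ used here is easy to check numerically. The snippet below is a small sanity check under illustrative assumptions (the helper names and the choice $n=64$ are ours); it compares base-2 logarithms, which is the scale on which the number of subspaces enters the entropy estimates.

```python
import math

def log2_num_subspaces(n, m):
    """log2 of C(n, m), the number of m-element subsets of an n-element system."""
    return math.log2(math.comb(n, m))

def log2_en_over_m_bound(n, m):
    """log2 of the bound (e n / m)^m."""
    return m * math.log2(math.e * n / m)

n = 64
# Difference between the bound and the exact count, on the log2 scale, for every m.
gaps = [log2_en_over_m_bound(n, m) - log2_num_subspaces(n, m) for m in range(1, n + 1)]
```

Every entry of `gaps` should be nonnegative, since $m!\ge(m/e)^m$ gives $\binom{n}{m}\le n^m/m!\le(en/m)^m$.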



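Theorem 3.1. Let a compact $F\subset X$ be such that there exists a normalized system $\mathcal D$, $|\mathcal D|=n$, and a number $r>0$ such that

\[ \sigma_m(F,\mathcal D)_X \le m^{-r}, \quad m\le n. \]

Then for $k\le n$

\[ \varepsilon_k(F,X) \le C(r)\left(\frac{\log(2n/k)}{k}\right)^r. \tag{3.1} \]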

Remark 3.1. Suppose that a compact $F$ from Theorem 3.1 belongs to an $n$-dimensional subspace $X_n := \operatorname{span}(\mathcal D)$. Then in addition to (3.1) we have for $k\ge n$

\[ \varepsilon_k(F,X) \le C(r)n^{-r}2^{-k/n}. \tag{3.2} \]

Proof. Inequality (3.2) follows from Theorem 3.1 with $X=X_n$, $k=n$, inequality (2.1) and a simple well known inequality

\[ \varepsilon_{k_1+k_2}(A,X_n) \le \varepsilon_{k_1}(A,X_n)\,\varepsilon_{k_2}(B_{X_n},X_n), \tag{3.3} \]

where $A$ is a compact and $B_{X_n}$ is a unit ball of $X_n$. □

As a corollary of Theorem 3.1 and Remark 3.1 we obtain the following classical bound.

Corollary 3.1. For any $0<q\le\infty$ and $\max(1,q)\le p\le\infty$ we have

\[ \varepsilon_k(B_q^n,\ell_p^n) \le C(q,p)\begin{cases} \left(\dfrac{\log(2n/k)}{k}\right)^{1/q-1/p}, & k\le n,\\[2mm] 2^{-k/n}n^{1/p-1/q}, & k\ge n. \end{cases} \]



Proof. Indeed, it is well known and easy to check that for a sequence of nonnegative numbers $x_1\ge x_2\ge\dots\ge x_n$ we have for $0<q\le p$

\[ \Big(\sum_{j=m+1}^{n} x_j^p\Big)^{1/p} \le m^{1/p-1/q}\Big(\sum_{j=1}^{n} x_j^q\Big)^{1/q}. \tag{3.4} \]

Therefore, for $0<q\le p$

\[ \sigma_m(B_q^n,\{e_j\}_{j=1}^{n})_{\ell_p^n} \le m^{1/p-1/q}, \quad m\le n, \]

where $\{e_j\}_{j=1}^{n}$ is a canonical basis for $\mathbb R^n$. Applying Theorem 3.1 and Remark 3.1 we obtain Corollary 3.1. □
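The tail estimate $\big(\sum_{j>m}x_j^p\big)^{1/p}\le m^{1/p-1/q}\big(\sum_j x_j^q\big)^{1/q}$ for nonincreasing nonnegative sequences and $0<q\le p$ can be checked numerically. The sketch below does this for a sample sequence; the function names and the test sequence are illustrative assumptions. For the canonical basis the tail norm is exactly the error of keeping the $m$ largest coordinates, which is how the bound on $\sigma_m$ arises.

```python
def tail_lp_norm(x, m, p):
    """l_p norm of the tail (x_{m+1}, ..., x_n) of a nonincreasing sequence x."""
    return sum(v ** p for v in x[m:]) ** (1.0 / p)

def lq_norm(x, q):
    """l_q (quasi-)norm of a nonnegative sequence x for 0 < q < infinity."""
    return sum(v ** q for v in x) ** (1.0 / q)

def check_tail_bound(x, q, p):
    """Check tail_lp_norm(x, m, p) <= m^(1/p - 1/q) * lq_norm(x, q) for 1 <= m < n."""
    # Rearrange the absolute values in nonincreasing order, as the estimate requires.
    xs = sorted((abs(v) for v in x), reverse=True)
    total = lq_norm(xs, q)
    return all(
        tail_lp_norm(xs, m, p) <= m ** (1.0 / p - 1.0 / q) * total + 1e-12
        for m in range(1, len(xs))
    )

# A sample sequence with polynomial decay.
x = [1.0 / (j + 1) ** 2 for j in range(50)]
```

For instance, `check_tail_bound(x, 0.5, 2.0)` tests the case $q=1/2$, $p=2$ for every $1\le m<50$.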

For a normalized system $\mathcal D$ define $A_q(\mathcal D)$, $q>0$, as a closure in $X$ of the set

\[ \Big\{x : x=\sum_j c_j g_j,\ g_j\in\mathcal D,\ \sum_j |c_j|^q\le 1\Big\}. \]

Corollary 3.2. Let $1<p<\infty$. For a normalized system $\mathcal D$ of cardinality $|\mathcal D|=n$ we have

\[ \varepsilon_k(A_1(\mathcal D),L_p) \le C(p)\left(\frac{\log(2n/k)}{k}\right)^{1-\max(\frac12,\frac1p)}, \quad k\le n. \tag{3.5} \]

Proof. It is known (see [2] and [7]) that

\[ \sigma_m(A_1(\mathcal D),\mathcal D)_{L_p} \le C(p)m^{\max(\frac12,\frac1p)-1}. \tag{3.6} \]

It remains to apply Theorem 3.1. □

Corollary 3.3. Let $\mathcal D$ be a normalized system of cardinality $|\mathcal D|=n$. Then for $0<q\le 1$ and $1<p<\infty$ we have

\[ \varepsilon_k(A_q(\mathcal D),L_p) \le C(q,p)\begin{cases} \left(\dfrac{\log(2n/k)}{k}\right)^{1/q-\max(\frac12,\frac1p)}, & k\le n,\\[2mm] 2^{-k/n}n^{\max(\frac12,\frac1p)-1/q}, & k\ge n. \end{cases} \]

Proof. We estimate $\sigma_m(A_q(\mathcal D),\mathcal D)_{L_p}$. If $q=1$ then the bound is given by (3.6). If $q<1$ then we use (3.4) with $p=1$ and by (3.6) we get

\[ \sigma_{2m}(A_q(\mathcal D),\mathcal D)_{L_p} \le C(q,p)m^{\max(\frac12,\frac1p)-1/q}. \]

Applying Theorem 3.1 and Remark 3.1 we obtain Corollary 3.3. □

We note that Corollary 3.3 gives the same upper bounds as in Theorem 1 of [3]. It is proved in [3] that these bounds are best possible up to a constant.

4 A greedy algorithm 

In Section 3 we showed how best $m$-term approximations can be used for estimating the entropy numbers. Here we note that $m$-term approximations are very important by themselves in the context of sparse approximation. In this context an important problem is to provide an algorithm that builds a good $m$-term approximation. We discuss a greedy algorithm in this section. The theory of greedy approximation is well developed (see [7]). A typical problem of greedy approximation is a problem of $m$-term approximation with respect to a dictionary. We say that a set of elements (functions) $\mathcal D$ from a Banach space $X$ is a dictionary, respectively, symmetric dictionary, if each $g\in\mathcal D$ has norm bounded by one ($\|g\|\le 1$),

\[ g\in\mathcal D \quad\text{implies}\quad -g\in\mathcal D, \]

and the closure of $\operatorname{span}\mathcal D$ is $X$. We denote the closure (in $X$) of the convex hull of $\mathcal D$ by $A_1(\mathcal D)$. In this section we discuss greedy algorithms with regard to a system $\mathcal D$ that is not a dictionary. Here, we will discuss a variant of the Weak Relaxed Greedy Algorithm (WRGA). Let $X$ be a real Banach space and let $\mathcal D := \{g\}$ be a system of elements $g\in X$ such that $\|g\|\le 1$ and $g\in\mathcal D$ implies $-g\in\mathcal D$. Usually, in the theory of greedy algorithms we consider approximation with regard to a dictionary $\mathcal D$. One of the properties of a dictionary $\mathcal D$ is that the closure of $\operatorname{span}\mathcal D$ is equal to $X$. In this section we do not assume that the system $\mathcal D$ is a dictionary. In particular, we do not assume that the closure of $\operatorname{span}\mathcal D$ is $X$. This setting is motivated by applications in Learning Theory (see Chapter 4 of [7]).

For a nonzero element $f\in X$ we let $F_f$ denote a norming (peak) functional for $f$:

\[ \|F_f\| = 1, \qquad F_f(f) = \|f\|. \]

The existence of such a functional is guaranteed by the Hahn-Banach theorem.

Let $\tau := \{t_k\}_{k=1}^{\infty}$ be a given weakness sequence of numbers $t_k\in[0,1]$, $k=1,2,\dots$.

Weak Relaxed Greedy Algorithm (WRGA). We define $f_0^r := f_0^{r,\tau} := f$ and $G_0^r := G_0^{r,\tau} := 0$. Then, for each $m\ge 1$ we have the following inductive definition.

(1) $\varphi_m^r := \varphi_m^{r,\tau}\in\mathcal D$ is any element satisfying

\[ F_{f_{m-1}^r}(\varphi_m^r - G_{m-1}^r) \ge t_m\sup_{g\in\mathcal D} F_{f_{m-1}^r}(g - G_{m-1}^r). \]

(2) Find $0\le\lambda_m\le 1$ such that

\[ \|f - ((1-\lambda_m)G_{m-1}^r + \lambda_m\varphi_m^r)\| = \inf_{0\le\lambda\le 1}\|f - ((1-\lambda)G_{m-1}^r + \lambda\varphi_m^r)\| \]

and define

\[ G_m^r := G_m^{r,\tau} := (1-\lambda_m)G_{m-1}^r + \lambda_m\varphi_m^r. \]

(3) Let

\[ f_m^r := f_m^{r,\tau} := f - G_m^r. \]
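The three steps of the WRGA (weak greedy selection of an element of the system, exact relaxation over $\lambda\in[0,1]$, update of the residual) can be sketched in the simplest setting. The code below is an illustration under strong assumptions, not the general algorithm: it works in $\mathbb R^d$ with the Euclidean norm, where the norming functional of $h$ is $x\mapsto\langle h,x\rangle/\|h\|$ and the line search has a closed form, and it uses a constant weakness parameter `t`. The function name `wrga_euclidean` and the sample system are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def wrga_euclidean(f, system, steps, t=1.0):
    """Sketch of the WRGA in R^d with the Euclidean norm.

    `system` is assumed symmetric (g in system implies -g in system) with
    norms at most one. Returns the residual norms ||f_0||, ..., ||f_steps||.
    """
    G = [0.0] * len(f)
    residual = list(f)
    norms = [math.sqrt(dot(residual, residual))]
    for _ in range(steps):
        # Step (1): weak greedy selection. The positive factor 1/||residual||
        # of the norming functional does not affect the comparison.
        scores = [dot(residual, [gi - Gi for gi, Gi in zip(g, G)]) for g in system]
        best = max(scores)
        threshold = t * best if best > 0 else best
        phi = next(g for g, s in zip(system, scores) if s >= threshold)
        # Step (2): exact minimization of ||residual - lam * (phi - G)|| over [0, 1].
        d = [pi - Gi for pi, Gi in zip(phi, G)]
        dd = dot(d, d)
        lam = min(1.0, max(0.0, dot(residual, d) / dd)) if dd > 0 else 0.0
        G = [(1 - lam) * Gi + lam * pi for Gi, pi in zip(G, phi)]
        # Step (3): new residual f - G_m.
        residual = [fi - Gi for fi, Gi in zip(f, G)]
        norms.append(math.sqrt(dot(residual, residual)))
    return norms

# Symmetric system: plus and minus the canonical basis of R^3.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
system = basis + [[-v for v in g] for g in basis]

norms_in = wrga_euclidean([0.5, 0.3, 0.2], system, 20)    # target in the convex hull
norms_out = wrga_euclidean([2.0, 0.0, 0.0], system, 5)    # target outside the hull
norms_weak = wrga_euclidean([0.5, 0.3, 0.2], system, 20, t=0.5)
```

In the second run the target lies at distance 1 from the convex hull of the system, so the residual norm cannot drop below 1; this is the situation in which the system is used without assuming that the target belongs to $A_1(\mathcal D)$.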

For a Banach space $X$ we define the modulus of smoothness

\[ \rho(u) := \sup_{\|x\|=\|y\|=1}\Big(\frac12(\|x+uy\|+\|x-uy\|)-1\Big). \]

The uniformly smooth Banach space is the one with the property

\[ \lim_{u\to 0}\rho(u)/u = 0. \]

The following theorem was proved in [5] (see also Theorem 6.17 on p. 348 in [7]) for $\mathcal D$ being a dictionary.

Theorem 4.1. Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le 2$. Then, for a sequence $\tau := \{t_k\}_{k=1}^{\infty}$, $t_k\le 1$, $k=1,2,\dots$, we have for any $f\in A_1(\mathcal D)$ that

\[ \|f_m^{r,\tau}\| \le C_1(q,\gamma)\Big(1+\sum_{k=1}^{m} t_k^p\Big)^{-1/p}, \qquad p := \frac{q}{q-1}, \]

with a constant $C_1(q,\gamma)$ which may depend only on $q$ and $\gamma$.

We prove here an analog of the above theorem when we do not assume that $\mathcal D$ is a dictionary and only assume that $\mathcal D=\{g\}$ is a symmetric system with the property $\|g\|\le 1$.

Theorem 4.2. Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le 2$. Then, for a sequence $\tau := \{t_k\}_{k=1}^{\infty}$, $t_k\le 1$, $k=1,2,\dots$, we have for any $f\in X$ that

\[ \|f_m^{r,\tau}\| \le \inf_{\varphi\in A_1(\mathcal D)}\|f-\varphi\| + C_2(q,\gamma)\Big(1+\sum_{k=1}^{m} t_k^p\Big)^{-1/p}, \qquad p := \frac{q}{q-1}, \]

with a constant $C_2(q,\gamma)$ which may depend only on $q$ and $\gamma$.

Remark 4.1. In the case of a Hilbert space $H$ there are stronger results for similar greedy algorithms with $\tau=\{1\}$ (see [7], p. 99, Theorem 2.28):

\[ \|f_m\|_H^2 \le \Big(\inf_{\varphi\in A_1(\mathcal D)}\|f-\varphi\|_H\Big)^2 + Cm^{-1}. \]

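Proof. The proof of Theorem 4.2 is similar to the proof of Theorem 4.1. Denote

\[ b := \inf_{\varphi\in A_1(\mathcal D)}\|f-\varphi\|. \]

We use the following lemma.

Lemma 4.1. Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u)$. Then, for a given $f\in X$ we have

\[ \|f_m^r\| \le \inf_{0\le\lambda\le 1}\Big(\|f_{m-1}^r\| - \lambda t_m(\|f_{m-1}^r\|-b) + 2\|f_{m-1}^r\|\rho\Big(\frac{2\lambda}{\|f_{m-1}^r\|}\Big)\Big). \]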
Proof. We have

\[ f_m^r := f - ((1-\lambda_m)G_{m-1}^r + \lambda_m\varphi_m^r) = f_{m-1}^r - \lambda_m(\varphi_m^r - G_{m-1}^r) \]

and

\[ \|f_m^r\| = \inf_{0\le\lambda\le 1}\|f_{m-1}^r - \lambda(\varphi_m^r - G_{m-1}^r)\|. \]

We have from the definition of the modulus of smoothness for any $\lambda$

\[ \|f_{m-1}^r - \lambda(\varphi_m^r - G_{m-1}^r)\| + \|f_{m-1}^r + \lambda(\varphi_m^r - G_{m-1}^r)\| \le 2\|f_{m-1}^r\|\Big(1+\rho\Big(\frac{\lambda\|\varphi_m^r - G_{m-1}^r\|}{\|f_{m-1}^r\|}\Big)\Big). \tag{4.1} \]

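Next we get for $\lambda\ge 0$

\[ \|f_{m-1}^r + \lambda(\varphi_m^r - G_{m-1}^r)\| \ge F_{f_{m-1}^r}(f_{m-1}^r + \lambda(\varphi_m^r - G_{m-1}^r)) \]

\[ = \|f_{m-1}^r\| + \lambda F_{f_{m-1}^r}(\varphi_m^r - G_{m-1}^r) \ge \|f_{m-1}^r\| + \lambda t_m\sup_{g\in\mathcal D} F_{f_{m-1}^r}(g - G_{m-1}^r). \]

Using Lemma 6.10, p. 343, from [7] we continue

\[ = \|f_{m-1}^r\| + \lambda t_m\sup_{\varphi\in A_1(\mathcal D)} F_{f_{m-1}^r}(\varphi - G_{m-1}^r) \ge \|f_{m-1}^r\| + \lambda t_m(\|f_{m-1}^r\| - b). \]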



Using the trivial estimate $\|\varphi_m^r - G_{m-1}^r\|\le 2$ we obtain from (4.1)

\[ \|f_{m-1}^r - \lambda(\varphi_m^r - G_{m-1}^r)\| \le \|f_{m-1}^r\| - \lambda t_m(\|f_{m-1}^r\| - b) + 2\|f_{m-1}^r\|\rho\Big(\frac{2\lambda}{\|f_{m-1}^r\|}\Big), \]

which proves Lemma 4.1. □
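Set

\[ a_m := \|f_m^r\| - b. \]

Note that

\[ 0 \le a_m \le 2. \]

Using monotonicity of $\rho(u)/u$ we derive from Lemma 4.1

\[ a_m \le a_{m-1}\inf_{\lambda\in[0,1]}\big(1-\lambda t_m + 2\rho(2\lambda/a_{m-1})\big). \]

For $\rho(u)\le\gamma u^q$ it gives

\[ a_m \le a_{m-1}\inf_{\lambda\in[0,1]}\big(1-\lambda t_m + 2\gamma(2\lambda/a_{m-1})^q\big). \]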

Denote by $\lambda_1$ the solution of the equation

\[ \frac12\lambda t_m = 2\gamma\Big(\frac{2\lambda}{a_{m-1}}\Big)^q. \]

If $\lambda_1\le 1$ then

\[ \inf_{\lambda\in[0,1]}\big(1-\lambda t_m + 2\gamma(2\lambda/a_{m-1})^q\big) \le 1-\lambda_1 t_m + 2\gamma(2\lambda_1/a_{m-1})^q \]

\[ = 1-\frac12\lambda_1 t_m = 1 - C_3(q,\gamma)t_m^p a_{m-1}^p, \qquad p := \frac{q}{q-1}. \]

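If $\lambda_1>1$ then for all $\lambda\le\lambda_1$ we have

\[ \frac12\lambda t_m \ge 2\gamma\Big(\frac{2\lambda}{a_{m-1}}\Big)^q. \]

Specifying $\lambda=1$ we get

\[ \inf_{\lambda\in[0,1]}\big(1-\lambda t_m + 2\gamma(2\lambda/a_{m-1})^q\big) \le 1-\frac12 t_m \le 1 - C_4(q,\gamma)t_m^p a_{m-1}^p. \]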

Setting $C_5 := C_5(q,\gamma) := \min(C_3(q,\gamma),C_4(q,\gamma))$ we obtain

\[ a_m \le a_{m-1}\big(1 - C_5 t_m^p a_{m-1}^p\big). \tag{4.5} \]
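The constants $C_3$ and $C_4$ can be made explicit. The computation below is a sketch which assumes, as the two cases above indicate, that $\lambda_1$ is defined by the equation $\frac12\lambda t_m = 2\gamma(2\lambda/a_{m-1})^q$; the abbreviations $a$ and $t$ are introduced only for this check, and the resulting values are one admissible choice rather than the constants of the cited sources.

```latex
% Abbreviations (ours): a := a_{m-1}, t := t_m, p := q/(q-1).
\begin{align*}
\tfrac12\lambda t = 2\gamma\left(\frac{2\lambda}{a}\right)^q
  &\iff \lambda^{q-1} = \frac{t\,a^q}{2^{q+2}\gamma}
  \iff \lambda_1 = \left(\frac{t\,a^q}{2^{q+2}\gamma}\right)^{1/(q-1)},\\
\tfrac12\lambda_1 t
  &= \tfrac12\,(2^{q+2}\gamma)^{-1/(q-1)}\,t^{p}a^{p}.
\end{align*}
% Hence one may take C_3(q,\gamma) = \tfrac12 (2^{q+2}\gamma)^{-1/(q-1)}.
% In the case \lambda_1 > 1: t \le 1 and a \le 2 give t^p a^p \le 2^p t,
% so \tfrac12 t \ge 2^{-p-1} t^p a^p, and C_4 = 2^{-p-1} works.
```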

It is known (see [7], p. 345) that inequalities (4.5) imply

\[ a_m \le C_2(q,\gamma)\Big(1+\sum_{n=1}^{m} t_n^p\Big)^{-1/p}. \]

This completes the proof of Theorem 4.2. □

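It is known (see, for instance, [2], Lemma B.1) that in the case $X=L_p$ we have

\[ \rho(u)\le u^p/p \quad\text{if}\quad 1\le p\le 2 \qquad\text{and}\qquad \rho(u)\le(p-1)u^2/2 \quad\text{if}\quad 2\le p<\infty. \]

Therefore, in this case Theorem 4.2 gives: for any $f\in L_p$

\[ \|f_m^{r,\tau}\|_{L_p} \le \inf_{\varphi\in A_1(\mathcal D)}\|f-\varphi\|_{L_p} + C(p)\Big(1+\sum_{k=1}^{m} t_k^s\Big)^{-1/s}, \tag{4.6} \]

where $s := \max(\frac{p}{p-1},2)$. It was proved in [3] that for $0<v\le 1$,

\[ \sigma_m(f,\mathcal D)_{L_p} \le \inf_{\varphi\in A_v(\mathcal D)}\|f-\varphi\|_{L_p} + C(p)m^{\max(1/p,1/2)-1/v}. \tag{4.7} \]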

The proof in [3] is probabilistic and does not provide a deterministic algorithm for constructing a good $m$-term approximation. We note that inequality (4.6) shows that in the case $v=1$ the greedy algorithm WRGA with $\tau=\{t\}$ provides the rate of approximation as in (4.7).